mdate-sh
deterministic. Original patch by Reiner Herrmann.
Kenneth J. Pronovici uploaded epydoc/3.0.1+dfsg-8 which now honors SOURCE_DATE_EPOCH. Original patch by Reiner Herrmann.
Chris Lamb submitted a patch to dh-python to make the order of the generated maintainer scripts deterministic. Chris also offered a fix for a source of non-determinism in dpkg-shlibdeps when packages have alternative dependencies.
Dhole provided a patch to add support for SOURCE_DATE_EPOCH to gettext.
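The patches above all follow the same idea: when SOURCE_DATE_EPOCH is set, a tool embeds that value instead of the current time. A minimal Python sketch of that behaviour is shown here; the function name `build_timestamp` is hypothetical and is not taken from any of the patches mentioned.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=None):
    """Return the timestamp a build tool should embed, as an aware UTC datetime.

    Hypothetical helper: prefers SOURCE_DATE_EPOCH (seconds since the Unix
    epoch) and only falls back to the wall clock when it is unset.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        epoch = int(value)        # reproducible: same input, same output
    else:
        epoch = int(time.time())  # non-reproducible fallback
    # Always format in UTC so the builder's timezone cannot leak in.
    return datetime.fromtimestamp(epoch, tz=timezone.utc)
```

Passing the environment as a parameter keeps the helper easy to test; a real tool would simply read `os.environ`.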
Packages fixed
The following 78 packages became reproducible in our setup due to changes in their
build dependencies:
chemical-mime-data,
clojure-contrib,
cobertura-maven-plugin,
cpm,
davical,
debian-security-support,
dfc,
diction,
dvdwizard,
galternatives,
gentlyweb-utils,
gifticlib,
gmtkbabel,
gnuplot-mode,
gplanarity,
gpodder,
gtg-trace,
gyoto,
highlight.js,
htp,
ibus-table,
impressive,
jags,
jansi-native,
jnr-constants,
jthread,
jwm,
khronos-api,
latex-coffee-stains,
latex-make,
latex2rtf,
latexdiff,
libcrcutil,
libdc0,
libdc1394-22,
libidn2-0,
libint,
libjava-jdbc-clojure,
libkryo-java,
libphone-ui-shr,
libpicocontainer-java,
libraw1394,
librostlab-blast,
librostlab,
libshevek,
libstxxl,
libtools-logging-clojure,
libtools-macro-clojure,
litl,
londonlaw,
ltsp,
macsyfinder,
mapnik,
maven-compiler-plugin,
mc,
microdc2,
miniupnpd,
monajat,
navit,
pdmenu,
pirl,
plm,
scikit-learn,
snp-sites,
sra-sdk,
sunpinyin,
tilda,
vdr-plugin-dvd,
vdr-plugin-epgsearch,
vdr-plugin-remote,
vdr-plugin-spider,
vdr-plugin-streamdev,
vdr-plugin-sudoku,
vdr-plugin-xineliboutput,
veromix,
voxbo,
xaos,
xbae.
The following packages became reproducible after getting fixed:
LC_ALL=C when running sort.
TZ=UTC when calling unzip.
Makefile.
debian/changelog in version string.
TZ=UTC when calling unzip.
TZ=UTC when calling unzip.
TZ=UTC when calling unzip.
TZ=UTC when calling unzip.
TZ=UTC when calling unzip.
debian/changelog in manpages.
*.pyo and *.pyc from binary package.
debian/changelog when generating version strings.
debian/changelog.
freebsd-hackers mailing list. The build is run on a new virtual machine running FreeBSD 10.1 with 3 cores and 6 GB of RAM, also sponsored by Profitbricks.
strip-nondeterminism development
Andrew Ayer released version 0.009 of strip-nondeterminism. The new version will strip locales from Javadoc, include the name of files causing errors, and ignore unhandled (but rare) zip64 archives.
debbindiff development
Lunar continued its major refactoring to enhance code reuse and pave the way to fuzzy-matching and parallel processing. Most file comparators have now been converted to the new class hierarchy.
In order to support more archive formats, work has started on packaging Python bindings for libarchive. While getting support for more archive formats with a common interface is very nice, libarchive is a stream-oriented library and might perform badly with how debbindiff currently works. Time will tell if better solutions need to be found.
Documentation update
Lunar started a Reproducible builds HOWTO intended to explain the different aspects of making software build reproducibly to the different audiences that might have to get involved like software authors, producers of binary packages, and distributors.
Package reviews
17 obsolete reviews have been removed, 212 added and 46 updated this week.
15 new bugs for packages failing to build from source have been reported by Chris West (Faux) and Mattia Rizzolo.
Presentations
Lunar presented Debian efforts and some recipes on making software build reproducibly at Libre Software Meeting 2015. Slides and a video recording are available.
Misc.
h01ger, dkg, and Lunar attended a Core Infrastructure Initiative meeting. The progress and tools made for the Debian efforts were shown. Several discussions also helped in getting a better understanding of the needs of other free software projects regarding reproducible builds. The idea of a global append-only log, similar to the logs used for Certificate Transparency, came up on multiple occasions. Using such append-only logs for keeping records of sources and build results has gotten the name Binary Transparency Logs. They would at least help in identifying a compromised software signing key. Whether the benefits of using such logs justify the costs needs more research.
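To make the append-only log idea more concrete, here is a minimal sketch using a plain hash chain. This is a deliberate simplification: Certificate Transparency uses Merkle trees, and nothing here reflects an actual Binary Transparency design. The class name `AppendOnlyLog` and the record layout are assumptions made for illustration.

```python
import hashlib


class AppendOnlyLog:
    """Toy hash-chained log of (source hash, binary hash) records.

    Hypothetical sketch: each entry's digest covers the previous digest,
    so rewriting an old record invalidates every later digest.
    """

    GENESIS = "0" * 64  # placeholder digest preceding the first entry

    def __init__(self):
        self.entries = []  # list of (record_bytes, chained_digest_hex)

    def append(self, source_hash, binary_hash):
        record = f"{source_hash} {binary_hash}".encode()
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        digest = hashlib.sha256(prev.encode() + record).hexdigest()
        self.entries.append((record, digest))
        return digest

    def verify(self):
        """Recompute the chain and report whether every digest still matches."""
        prev = self.GENESIS
        for record, digest in self.entries:
            if hashlib.sha256(prev.encode() + record).hexdigest() != digest:
                return False
            prev = digest
        return True
```

Anyone holding the latest digest can detect that an earlier record was altered, which is the property that would help spot binaries signed with a compromised key but never logged.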
Debian is undertaking a huge effort to develop a reproducible builds system. I'd like to thank you for that. This could be Debian's most important project, with how badly computer security has been going.
PerniciousPunk in Reddit's Ask me anything! to Neil McGovern, DPL.
What happened in the reproducible builds effort this week:
Toolchain fixes
More tools are getting patched to use the value of the SOURCE_DATE_EPOCH environment variable as the current time:
SOURCE_DATE_EPOCH to the time of the latest debian/changelog entry when exporting build flags, patch sent as #791823 (Dhole),
texlive-bin (akira) and libxslt (Dhole) with the aforementioned support for SOURCE_DATE_EPOCH.
debhelper exported TZ=UTC and this made packages capturing the current date (without the time) reproducible in the current test environment.
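Deriving SOURCE_DATE_EPOCH from the latest debian/changelog entry can be sketched as follows. This is an illustration only, not the patch referenced above: the function name `changelog_epoch` is hypothetical, the sample maintainer is made up, and real packaging would more likely rely on dpkg-parsechangelog than on hand-rolled parsing.

```python
from email.utils import parsedate_to_datetime


def changelog_epoch(changelog_text):
    """Return the Unix time of the newest entry in a debian/changelog.

    Hypothetical helper. Entries are newest-first, and each one ends with a
    trailer line of the form " -- Name <email>  RFC 2822 date".
    """
    for line in changelog_text.splitlines():
        if line.startswith(" -- "):
            # The date follows the closing ">" of the maintainer address.
            _, _, date_part = line.rpartition(">")
            return int(parsedate_to_datetime(date_part.strip()).timestamp())
    raise ValueError("no changelog trailer line found")


# Made-up changelog entry used only to exercise the helper.
SAMPLE = """\
hello (1.0-2) unstable; urgency=medium

  * Make the build reproducible.

 -- Jane Doe <jane@example.org>  Mon, 06 Jul 2015 00:00:00 +0000
"""
```

Because the trailer carries an explicit UTC offset, the resulting epoch does not depend on the timezone of the machine running the build.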
The following packages became reproducible after getting fixed:
debian/changelog date in the manpage.
debian/changelog date as build date and use debian as the builder hostname.
debian/changelog date as build date.
reproducible.debian.net
Map qag.map
Map qbk.map
Map qcr.map
Map qcs.map
Map qhv.map
Map qpl.map
Map qtm.map
Map qzc.map
aleph aleph - *aleph.ini
lamed aleph language.dat *lambda.ini
name=polish lefthyphenmin=2 righthyphenmin=2 file=loadhyph-pl.tex file_patterns=hyph-pl.pat.txt file_exceptions=hyph-pl.hyp.txt
deb http://people.debian.org/~preining/TeX/ exp/
deb-src http://people.debian.org/~preining/TeX/ exp/
CPU: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz
RAM: 8 GiB - Occupying 2 slots
Memory Controller Information
Supported Interleave: One-way Interleave
Current Interleave: One-way Interleave
Maximum Memory Module Size: 8192 MB
Maximum Total Memory Size: 16384 MB
Handle 0x0006, DMI type 6, 12 bytes
Handle 0x0007, DMI type 6, 12 bytes
The usual PCI devices:
rrs@learner:~$ lspci
00:00.0 Host bridge: Intel Corporation Haswell-ULT DRAM Controller (rev 0b)
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b)
00:03.0 Audio device: Intel Corporation Haswell-ULT HD Audio Controller (rev 0b)
00:14.0 USB controller: Intel Corporation 8 Series USB xHCI HC (rev 04)
00:16.0 Communication controller: Intel Corporation 8 Series HECI #0 (rev 04)
00:1b.0 Audio device: Intel Corporation 8 Series HD Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 8 Series PCI Express Root Port 4 (rev e4)
00:1d.0 USB controller: Intel Corporation 8 Series USB EHCI #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation 8 Series LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 8 Series SATA Controller 1 [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 8 Series SMBus Controller (rev 04)
01:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8723BE PCIe Wireless Network Adapter
And the storage devices:
Device Model: WDC WD5000M22K-24Z1LT0-SSHD-16GB
Device Model: KINGSTON SM2280S3120G
Storage
The drive runs into serious performance problems when the SSHD's NCQ (mis)feature is in use on Linux <= 4.0.
[28974.232550] ata2.00: configured for UDMA/133
[28974.232565] ahci 0000:00:1f.2: port does not support device sleep
[28983.680955] ata1.00: exception Emask 0x10 SAct 0x7fffffff SErr 0x400100 action 0x6 frozen
[28983.681000] ata1.00: irq_stat 0x08000000, interface fatal error
[28983.681027] ata1: SError: UnrecovData Handshk
[28983.681052] ata1.00: failed command: WRITE FPDMA QUEUED
[28983.681082] ata1.00: cmd 61/40:00:b8:84:88/05:00:0a:00:00/40 tag 0 ncq 688128 out res 40/00:3c:78:a9:88/00:00:0a:00:00/40 Emask 0x10 (ATA bus error)
[28983.681152] ata1.00: status: DRDY
[28983.681171] ata1.00: failed command: WRITE FPDMA QUEUED
[28983.681202] ata1.00: cmd 61/40:08:f8:89:88/05:00:0a:00:00/40 tag 1 ncq 688128 out res 40/00:3c:78:a9:88/00:00:0a:00:00/40 Emask 0x10 (ATA bus error)
[28983.681271] ata1.00: status: DRDY
[28983.681289] ata1.00: failed command: WRITE FPDMA QUEUED
[28983.681316] ata1.00: cmd 61/40:10:38:8f:88/05:00:0a:00:00/40 tag 2 ncq 688128 out res 40/00:3c:78:a9:88/00:00:0a:00:00/40 Emask 0x10 (ATA bus error)
[28983.681387] ata1.00: status: DRDY
[28983.681407] ata1.00: failed command: WRITE FPDMA QUEUED
[28983.681435] ata1.00: cmd 61/40:18:78:94:88/05:00:0a:00:00/40 tag 3 ncq 688128 out res 40/00:3c:78:a9:88/00:00:0a:00:00/40 Emask 0x10 (ATA bus error)
[28983.697642] ata1.00: status: DRDY
[28983.697643] ata1.00: failed command: WRITE FPDMA QUEUED
[28983.697646] ata1.00: cmd 61/40:c8:38:65:88/05:00:0a:00:00/40 tag 25 ncq 688128 out res 40/00:3c:78:a9:88/00:00:0a:00:00/40 Emask 0x10 (ATA bus error)
[28983.697647] ata1.00: status: DRDY
[28983.697648] ata1.00: failed command: WRITE FPDMA QUEUED
[28983.697651] ata1.00: cmd 61/40:d0:78:6a:88/05:00:0a:00:00/40 tag 26 ncq 688128 out res 40/00:3c:78:a9:88/00:00:0a:00:00/40 Emask 0x10 (ATA bus error)
[28983.697651] ata1.00: status: DRDY
[28983.697652] ata1.00: failed command: WRITE FPDMA QUEUED
[28983.697656] ata1.00: cmd 61/40:d8:b8:6f:88/05:00:0a:00:00/40 tag 27 ncq 688128 out res 40/00:3c:78:a9:88/00:00:0a:00:00/40 Emask 0x10 (ATA bus error)
[28983.697657] ata1.00: status: DRDY
[28983.697658] ata1.00: failed command: WRITE FPDMA QUEUED
[28983.697661] ata1.00: cmd 61/40:e0:f8:74:88/05:00:0a:00:00/40 tag 28 ncq 688128 out res 40/00:3c:78:a9:88/00:00:0a:00:00/40 Emask 0x10 (ATA bus error)
[28983.697662] ata1.00: status: DRDY
[28983.697663] ata1.00: failed command: WRITE FPDMA QUEUED
[28983.697666] ata1.00: cmd 61/40:e8:38:7a:88/05:00:0a:00:00/40 tag 29 ncq 688128 out res 40/00:3c:78:a9:88/00:00:0a:00:00/40 Emask 0x10 (ATA bus error)
[28983.697667] ata1.00: status: DRDY
[28983.697668] ata1.00: failed command: WRITE FPDMA QUEUED
[28983.697672] ata1.00: cmd 61/40:f0:78:7f:88/05:00:0a:00:00/40 tag 30 ncq 688128 out res 40/00:3c:78:a9:88/00:00:0a:00:00/40 Emask 0x10 (ATA bus error)
[28983.697672] ata1.00: status: DRDY
[28983.697676] ata1: hard resetting link
[28984.017356] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[28984.022612] ata1.00: configured for UDMA/133
[28984.022740] ata1: EH complete
[28991.611732] Suspending console(s) (use no_console_suspend to debug)
[28992.183822] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[28992.186569] sd 1:0:0:0: [sdb] Stopping disk
[28992.186604] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[28992.189594] sd 0:0:0:0: [sda] Stopping disk
[28992.967426] PM: suspend of devices complete after 1351.349 msecs
[28992.999461] PM: late suspend of devices complete after 31.990 msecs
[28993.000058] ehci-pci 0000:00:1d.0: System wakeup enabled by ACPI
[28993.000306] xhci_hcd 0000:00:14.0: System wakeup enabled by ACPI
[28993.016463] PM: noirq suspend of devices complete after 16.978 msecs
[28993.017024] ACPI: Preparing to enter system sleep state S3
[28993.017349] PM: Saving platform NVS memory
[28993.017357] Disabling non-boot CPUs ...
[28993.017389] intel_pstate CPU 1 exiting
[28993.018727] kvm: disabling virtualization on CPU1
[28993.019320] smpboot: CPU 1 is now offline
[28993.019646] intel_pstate CPU 2 exiting
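When a log like the one above scrolls by, it helps to know how many commands failed and on which port. The following is a small sketch for summarising saved dmesg output; the function name `count_failed_commands` is hypothetical, and the regular expression assumes the "failed command:" message format seen in this particular log.

```python
import re
from collections import Counter

# Matches e.g. "ata1.00: failed command: WRITE FPDMA QUEUED".
# Assumption: command names consist of capital letters and spaces only.
FAILED = re.compile(r"(ata\d+\.\d+): failed command: ([A-Z ]+)")


def count_failed_commands(dmesg_text):
    """Count failed ATA commands per (device, command) pair."""
    counts = Counter()
    for device, command in FAILED.findall(dmesg_text):
        counts[(device, command.strip())] += 1
    return counts


# Two lines copied from the log above, used as sample input.
SAMPLE_LOG = """\
[28983.681052] ata1.00: failed command: WRITE FPDMA QUEUED
[28983.681152] ata1.00: status: DRDY
[28983.681171] ata1.00: failed command: WRITE FPDMA QUEUED
"""
```

Feeding it the text of `dmesg` before and after a workaround gives a quick way to check whether the errors actually stopped.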
In the interim, to overcome this problem, we can force the device to run in degraded mode. I'm not sure if it is really a degraded mode, or if the device was falsely advertised as 6 Gbps capable. Time will tell, but for now I force it to run in 3 Gbps mode, and so far I haven't run into the above-mentioned problems. To force 3 Gbps speed, apply the following.
rrs@learner:~$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.0.4+ root=/dev/mapper/sdb_crypt ro cgroup_enable=memory swapaccount=1 rootflags=data=writeback libata.force=1:3 quiet
And then verify it... As you can see below, I've forced it for ata1 because I want my SSD drive to run at full-speed. I've done enough I/O, which earlier resulted in the kernel spitting the SATA errors. With this workaround, the kernel does not spit any error messages.
[ 1.273365] libata version 3.00 loaded.
[ 1.287290] ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 4 ports 6 Gbps 0x3 impl SATA mode
[ 1.288238] ata1: FORCE: PHY spd limit set to 3.0Gbps
[ 1.288240] ata1: SATA max UDMA/133 abar m2048@0xb051b000 port 0xb051b100 irq 41
[ 1.288242] ata2: SATA max UDMA/133 abar m2048@0xb051b000 port 0xb051b180 irq 41
[ 1.288244] ata3: DUMMY
[ 1.288245] ata4: DUMMY
[ 1.606971] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[ 1.607906] ata1.00: ATA-9: WDC WD5000M22K-24Z1LT0-SSHD-16GB, 02.01A03, max UDMA/133
[ 1.607910] ata1.00: 976773168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
[ 1.608856] ata1.00: configured for UDMA/133
[ 1.609106] scsi 0:0:0:0: Direct-Access ATA WDC WD5000M22K-2 1A03 PQ: 0 ANSI: 5
[ 1.927167] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1.928980] ata2.00: ATA-8: KINGSTON SM2280S3120G, S8FM06.A, max UDMA/133
[ 1.928983] ata2.00: 234441648 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 1.929616] ata2.00: configured for UDMA/133
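The libata.force=1:3 setting on the kernel command line pairs a port ID (1, i.e. ata1) with a value (3, which the log reports as a 3.0Gbps limit). As a rough illustration, the sketch below pulls those pairs out of a command line string. The function name `parse_libata_force` is hypothetical, and the parsing is a simplified reading of the ID:VAL syntax, not a reimplementation of the kernel's own parser.

```python
def parse_libata_force(cmdline):
    """Return (target, value) pairs from a libata.force= kernel parameter.

    Hypothetical helper. Entries are comma-separated; an entry without an
    explicit "ID:" prefix is returned with target None, and no attempt is
    made to mimic how the kernel resolves such entries.
    """
    prefix = "libata.force="
    pairs = []
    for token in cmdline.split():
        if not token.startswith(prefix):
            continue
        for item in token[len(prefix):].split(","):
            target, sep, value = item.partition(":")
            pairs.append((target, value) if sep else (None, item))
    return pairs


# Command line copied from the /proc/cmdline output shown earlier.
CMDLINE = (
    "BOOT_IMAGE=/vmlinuz-4.0.4+ root=/dev/mapper/sdb_crypt ro "
    "cgroup_enable=memory swapaccount=1 rootflags=data=writeback "
    "libata.force=1:3 quiet"
)
```

On a running system the same function could be applied to the contents of /proc/cmdline to confirm which port the limit was requested for.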
And the throughput you get out of the WD SATA SSHD drive, with the link speed limited to 3.0 Gbps, is:
rrs@learner:/media/SSHD/tmp$ while true; do dd if=/dev/zero of=foo.img bs=1M count=20000; sync; rm -rf foo.img; sync; done
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 202.014 s, 104 MB/s
20000+0 records in
20000+0 records out
20971520000 bytes (21 GB) copied, 206.111 s, 102 MB/s
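The dd figures can be checked by hand, and compared with what a 3.0 Gbps link can carry. The sketch below redoes that arithmetic; the helper names are hypothetical, and the link estimate assumes SATA's 8b/10b encoding (10 line bits per payload byte) while ignoring protocol overhead.

```python
def throughput_mb_s(num_bytes, seconds):
    """Throughput in MB/s, using 1 MB = 1,000,000 bytes as dd does."""
    return num_bytes / seconds / 1_000_000


def sata_payload_limit_mb_s(line_rate_gbps):
    """Upper bound on payload rate for a SATA link with 8b/10b encoding."""
    return line_rate_gbps * 1e9 / 10 / 1_000_000


first_run = throughput_mb_s(20971520000, 202.014)   # about 103.8 MB/s
second_run = throughput_mb_s(20971520000, 206.111)  # about 101.7 MB/s
link_limit = sata_payload_limit_mb_s(3.0)           # 300 MB/s
```

Both runs land at roughly a third of what the limited link could carry, which suggests that the 3.0 Gbps cap itself is not the bottleneck for sequential writes on this drive.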
Hannes Reinecke has submitted patches for NCQ enhancements, for Linux 4.1, which I hope will resolve these problems. Another option is to disable NCQ for the drive, or else blacklist the make/model in drivers/ata/libata-core.c.
By the time I finished this blog entry draft, I had run enough tests to conclude that this does not look like an NCQ problem, because the drive runs with NCQ enabled in degraded mode too (check above).
rrs@learner:~$ sudo fstrim -vv /media/SSHD
/media/SSHD: 268.2 GiB (287930949632 bytes) trimmed
rrs@learner:~$ sudo fstrim -vv /
[sudo] password for rrs:
/: 64 GiB (68650749952 bytes) trimmed
Another interesting feature of this drive is support for TRIM / DISCARD. This drive's FTL accepts the TRIM command. Of course, you need to ensure that you have discard enabled in all the layers. In my case: SATA + Device Mapper (Crypt and LVM) + File System (ext4).
Display
The overall display of this device is amazing. It is large enough to give you a vibrant look. At 1920x1080 resolution, things look good. The display support was available out-of-the-box.
There were some hangs during suspend / resume with kernels < 4.x. The issue was root-caused and fixed for Linux 4.0.
You may still notice the following kernel messages, though they have not been problematic for me so far.
[28977.518114] PM: thaw of devices complete after 3607.979 msecs
[28977.590389] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[28977.590582] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[28977.591095] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[28977.591185] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[28977.591368] acpi device:30: Cannot transition to power state D3cold for parent in (unknown)
[28977.591911] pci_bus 0000:01: Allocating resources
[28977.591933] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[28977.592093] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
[28977.592401] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment
You may need to disable the Intel Management Engine Interface (mei.ko), in case you run into suspend/resume problems.
rrs@learner:/media/SSHD/tmp$ cat /etc/modprobe.d/intel-mei-blacklist.conf
blacklist mei
blacklist mei-me
You may also run into the following Kernel Oops during suspend/resume. Below, you see two iterations of sleep because it first hibernates and then sleeps (s2both).
[ 180.470206] Syncing filesystems ... done. [ 180.473337] Freezing user space processes ... (elapsed 0.001 seconds) done. [ 180.475210] PM: Marking nosave pages: [mem 0x00000000-0x00000fff] [ 180.475213] PM: Marking nosave pages: [mem 0x0006f000-0x0006ffff] [ 180.475215] PM: Marking nosave pages: [mem 0x00088000-0x000fffff] [ 180.475220] PM: Marking nosave pages: [mem 0x97360000-0x97b5ffff] [ 180.475274] PM: Marking nosave pages: [mem 0x9c36f000-0x9cffefff] [ 180.475356] PM: Marking nosave pages: [mem 0x9d000000-0xffffffff] [ 180.476877] PM: Basic memory bitmaps created [ 180.477003] PM: Preallocating image memory... done (allocated 380227 pages) [ 180.851800] PM: Allocated 1520908 kbytes in 0.37 seconds (4110.56 MB/s) [ 180.851802] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done. [ 180.853355] Suspending console(s) (use no_console_suspend to debug) [ 180.853520] wlan0: deauthenticating from c4:6e:1f:d0:67:26 by local choice (Reason: 3=DEAUTH_LEAVING) [ 180.864159] cfg80211: Calling CRDA to update world regulatory domain [ 181.172222] PM: freeze of devices complete after 319.294 msecs [ 181.196080] ------------[ cut here ]------------ [ 181.196124] WARNING: CPU: 3 PID: 3707 at drivers/gpu/drm/i915/intel_display.c:7904 hsw_enable_pc8+0x659/0x7c0 [i915]() [ 181.196125] SPLL enabled [ 181.196159] Modules linked in: rfcomm ctr ccm bnep pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bridge stp llc xt_conntrack iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_CHECKSUM xt_tcpudp iptable_mangle ip_tables x_tables nls_utf8 nls_cp437 vfat fat rtsx_usb_ms memstick snd_hda_codec_hdmi joydev mousedev hid_sensor_rotation hid_sensor_incl_3d hid_sensor_als hid_sensor_accel_3d hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf industrialio hid_sensor_iio_common iTCO_wdt iTCO_vendor_support hid_multitouch 
x86_pkg_temp_thermal intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm btusb hid_sensor_hub bluetooth uvcvideo videobuf2_vmalloc videobuf2_memops [ 181.196203] videobuf2_core v4l2_common videodev media pcspkr evdev mac_hid arc4 psmouse serio_raw efivars i2c_i801 rtl8723be btcoexist rtl8723_common rtl_pci rtlwifi mac80211 snd_soc_rt5640 cfg80211 snd_soc_rl6231 snd_hda_codec_realtek i915 snd_soc_core snd_hda_codec_generic ideapad_laptop ac snd_compress dw_dmac sparse_keymap drm_kms_helper rfkill battery dw_dmac_core snd_hda_intel snd_pcm_dmaengine snd_soc_sst_acpi snd_hda_controller video 8250_dw regmap_i2c snd_hda_codec drm snd_hwdep snd_pcm spi_pxa2xx_platform i2c_designware_platform soc_button_array snd_timer i2c_designware_core snd i2c_algo_bit soundcore shpchp lpc_ich button processor fuse ipv6 autofs4 ext4 crc16 jbd2 mbcache btrfs xor raid6_pq algif_skcipher af_alg dm_crypt dm_mod sg usbhid sd_mod rtsx_usb_sdmmc rtsx_usb crct10dif_pclmul [ 181.196220] crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd ahci libahci libata xhci_pci ehci_pci xhci_hcd ehci_hcd scsi_mod usbcore usb_common thermal fan thermal_sys hwmon i2c_hid hid i2c_core sdhci_acpi sdhci mmc_core gpio_lynxpoint [ 181.196224] CPU: 3 PID: 3707 Comm: kworker/u16:7 Tainted: G O 4.0.4+ #14 [ 181.196225] Hardware name: LENOVO 20344/INVALID, BIOS 96CN29WW(V1.15) 10/16/2014 [ 181.196230] Workqueue: events_unbound async_run_entry_fn [ 181.196233] 0000000000000000 ffffffffa0706f68 ffffffff81522198 ffff880064debc88 [ 181.196235] ffffffff8106c5b1 ffff880251460000 ffff880250f83b68 ffff880250f83b78 [ 181.196237] ffff880250f83800 0000000000000001 ffffffff8106c62a ffffffffa071407c [ 181.196238] Call Trace: [ 181.196248] [<ffffffff81522198>] ? dump_stack+0x40/0x50 [ 181.196251] [<ffffffff8106c5b1>] ? warn_slowpath_common+0x81/0xb0 [ 181.196254] [<ffffffff8106c62a>] ? warn_slowpath_fmt+0x4a/0x50 [ 181.196278] [<ffffffffa06ae349>] ? 
hsw_enable_pc8+0x659/0x7c0 [i915] [ 181.196289] [<ffffffffa0643ee0>] ? intel_suspend_complete+0xe0/0x6e0 [i915] [ 181.196300] [<ffffffffa0644501>] ? i915_drm_suspend_late+0x21/0x90 [i915] [ 181.196311] [<ffffffffa0644690>] ? i915_pm_poweroff_late+0x40/0x40 [i915] [ 181.196318] [<ffffffff813fa7ba>] ? dpm_run_callback+0x4a/0x100 [ 181.196321] [<ffffffff813fb010>] ? __device_suspend_late+0xa0/0x180 [ 181.196324] [<ffffffff813fb10e>] ? async_suspend_late+0x1e/0xa0 [ 181.196326] [<ffffffff8108b973>] ? async_run_entry_fn+0x43/0x160 [ 181.196330] [<ffffffff81083a5d>] ? process_one_work+0x14d/0x3f0 [ 181.196332] [<ffffffff81084463>] ? worker_thread+0x53/0x480 [ 181.196334] [<ffffffff81084410>] ? rescuer_thread+0x300/0x300 [ 181.196338] [<ffffffff81089191>] ? kthread+0xc1/0xe0 [ 181.196341] [<ffffffff810890d0>] ? kthread_create_on_node+0x180/0x180 [ 181.196346] [<ffffffff81527898>] ? ret_from_fork+0x58/0x90 [ 181.196349] [<ffffffff810890d0>] ? kthread_create_on_node+0x180/0x180 [ 181.196350] ---[ end trace 8e339004db298838 ]--- [ 181.220094] PM: late freeze of devices complete after 47.936 msecs [ 181.220972] PM: noirq freeze of devices complete after 0.875 msecs [ 181.221577] ACPI: Preparing to enter system sleep state S4 [ 181.221886] PM: Saving platform NVS memory [ 181.222702] Disabling non-boot CPUs ... 
[ 181.222731] intel_pstate CPU 1 exiting [ 181.224041] kvm: disabling virtualization on CPU1 [ 181.224680] smpboot: CPU 1 is now offline [ 181.225121] intel_pstate CPU 2 exiting [ 181.226407] kvm: disabling virtualization on CPU2 [ 181.227025] smpboot: CPU 2 is now offline [ 181.227441] intel_pstate CPU 3 exiting [ 181.227728] Broke affinity for irq 19 [ 181.227747] Broke affinity for irq 41 [ 181.228771] kvm: disabling virtualization on CPU3 [ 181.228793] smpboot: CPU 3 is now offline [ 181.229624] PM: Creating hibernation image: [ 181.563651] PM: Need to copy 379053 pages [ 181.563655] PM: Normal pages needed: 379053 + 1024, available pages: 1697704 [ 182.472910] PM: Hibernation image created (379053 pages copied) [ 181.232347] PM: Restoring platform NVS memory [ 181.233171] Enabling non-boot CPUs ... [ 181.233246] x86: Booting SMP configuration: [ 181.233248] smpboot: Booting Node 0 Processor 1 APIC 0x1 [ 181.246771] kvm: enabling virtualization on CPU1 [ 181.249339] CPU1 is up [ 181.249389] smpboot: Booting Node 0 Processor 2 APIC 0x2 [ 181.262313] kvm: enabling virtualization on CPU2 [ 181.264853] CPU2 is up [ 181.264903] smpboot: Booting Node 0 Processor 3 APIC 0x3 [ 181.277831] kvm: enabling virtualization on CPU3 [ 181.280317] CPU3 is up [ 181.288471] ACPI: Waking up from system sleep state S4 [ 182.340655] PM: noirq thaw of devices complete after 0.637 msecs [ 182.378087] PM: early thaw of devices complete after 37.428 msecs [ 182.378436] rtlwifi: rtlwifi: wireless switch is on [ 182.451021] rtc_cmos 00:01: System wakeup disabled by ACPI [ 182.697575] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320) [ 182.697617] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 182.699248] ata1.00: configured for UDMA/133 [ 182.699911] ata2.00: configured for UDMA/133 [ 182.699917] ahci 0000:00:1f.2: port does not support device sleep [ 186.059539] PM: thaw of devices complete after 3685.338 msecs [ 186.134292] i915 0000:00:02.0: BAR 6: [??? 
0x00000000 flags 0x2] has bogus alignment [ 186.134479] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment [ 186.134992] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment [ 186.135080] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment [ 186.135266] acpi device:30: Cannot transition to power state D3cold for parent in (unknown) [ 186.135950] pci_bus 0000:01: Allocating resources [ 186.135974] pcieport 0000:00:1c.0: bridge window [mem 0x00100000-0x000fffff 64bit pref] to [bus 01] add_size 200000 [ 186.135980] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment [ 186.136049] pcieport 0000:00:1c.0: res[15]=[mem 0x00100000-0x000fffff 64bit pref] get_res_add_size add_size 200000 [ 186.136072] pcieport 0000:00:1c.0: BAR 15: assigned [mem 0x9fb00000-0x9fcfffff 64bit pref] [ 186.136174] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment [ 186.136490] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment [ 199.454497] Suspending console(s) (use no_console_suspend to debug) [ 200.024190] sd 1:0:0:0: [sdb] Synchronizing SCSI cache [ 200.024356] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 200.025359] sd 1:0:0:0: [sdb] Stopping disk [ 200.028701] sd 0:0:0:0: [sda] Stopping disk [ 201.106085] PM: suspend of devices complete after 1651.336 msecs [ 201.106591] ------------[ cut here ]------------ [ 201.106628] WARNING: CPU: 0 PID: 3725 at drivers/gpu/drm/i915/intel_display.c:7904 hsw_enable_pc8+0x659/0x7c0 [i915]() [ 201.106628] SPLL enabled [ 201.106656] Modules linked in: rfcomm ctr ccm bnep pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bridge stp llc xt_conntrack iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_CHECKSUM xt_tcpudp iptable_mangle ip_tables x_tables nls_utf8 nls_cp437 vfat fat rtsx_usb_ms memstick snd_hda_codec_hdmi joydev mousedev 
hid_sensor_rotation hid_sensor_incl_3d hid_sensor_als hid_sensor_accel_3d hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf industrialio hid_sensor_iio_common iTCO_wdt iTCO_vendor_support hid_multitouch x86_pkg_temp_thermal intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm btusb hid_sensor_hub bluetooth uvcvideo videobuf2_vmalloc videobuf2_memops [ 201.106694] videobuf2_core v4l2_common videodev media pcspkr evdev mac_hid arc4 psmouse serio_raw efivars i2c_i801 rtl8723be btcoexist rtl8723_common rtl_pci rtlwifi mac80211 snd_soc_rt5640 cfg80211 snd_soc_rl6231 snd_hda_codec_realtek i915 snd_soc_core snd_hda_codec_generic ideapad_laptop ac snd_compress dw_dmac sparse_keymap drm_kms_helper rfkill battery dw_dmac_core snd_hda_intel snd_pcm_dmaengine snd_soc_sst_acpi snd_hda_controller video 8250_dw regmap_i2c snd_hda_codec drm snd_hwdep snd_pcm spi_pxa2xx_platform i2c_designware_platform soc_button_array snd_timer i2c_designware_core snd i2c_algo_bit soundcore shpchp lpc_ich button processor fuse ipv6 autofs4 ext4 crc16 jbd2 mbcache btrfs xor raid6_pq algif_skcipher af_alg dm_crypt dm_mod sg usbhid sd_mod rtsx_usb_sdmmc rtsx_usb crct10dif_pclmul [ 201.106711] crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd ahci libahci libata xhci_pci ehci_pci xhci_hcd ehci_hcd scsi_mod usbcore usb_common thermal fan thermal_sys hwmon i2c_hid hid i2c_core sdhci_acpi sdhci mmc_core gpio_lynxpoint [ 201.106714] CPU: 0 PID: 3725 Comm: kworker/u16:25 Tainted: G W O 4.0.4+ #14 [ 201.106715] Hardware name: LENOVO 20344/INVALID, BIOS 96CN29WW(V1.15) 10/16/2014 [ 201.106720] Workqueue: events_unbound async_run_entry_fn [ 201.106723] 0000000000000000 ffffffffa0706f68 ffffffff81522198 ffff880064dd7c88 [ 201.106725] ffffffff8106c5b1 ffff880251460000 ffff880250f83b68 ffff880250f83b78 [ 201.106727] ffff880250f83800 0000000000000002 ffffffff8106c62a ffffffffa071407c [ 201.106728] 
Call Trace: [ 201.106737] [<ffffffff81522198>] ? dump_stack+0x40/0x50 [ 201.106740] [<ffffffff8106c5b1>] ? warn_slowpath_common+0x81/0xb0 [ 201.106742] [<ffffffff8106c62a>] ? warn_slowpath_fmt+0x4a/0x50 [ 201.106765] [<ffffffffa06ae349>] ? hsw_enable_pc8+0x659/0x7c0 [i915] [ 201.106776] [<ffffffffa0643ee0>] ? intel_suspend_complete+0xe0/0x6e0 [i915] [ 201.106786] [<ffffffffa0644501>] ? i915_drm_suspend_late+0x21/0x90 [i915] [ 201.106797] [<ffffffffa0644690>] ? i915_pm_poweroff_late+0x40/0x40 [i915] [ 201.106802] [<ffffffff813fa7ba>] ? dpm_run_callback+0x4a/0x100 [ 201.106805] [<ffffffff813fb010>] ? __device_suspend_late+0xa0/0x180 [ 201.106809] [<ffffffff813fb10e>] ? async_suspend_late+0x1e/0xa0 [ 201.106811] [<ffffffff8108b973>] ? async_run_entry_fn+0x43/0x160 [ 201.106813] [<ffffffff81083a5d>] ? process_one_work+0x14d/0x3f0 [ 201.106815] [<ffffffff81084463>] ? worker_thread+0x53/0x480 [ 201.106818] [<ffffffff81084410>] ? rescuer_thread+0x300/0x300 [ 201.106821] [<ffffffff81089191>] ? kthread+0xc1/0xe0 [ 201.106824] [<ffffffff810890d0>] ? kthread_create_on_node+0x180/0x180 [ 201.106827] [<ffffffff81527898>] ? ret_from_fork+0x58/0x90 [ 201.106830] [<ffffffff810890d0>] ? kthread_create_on_node+0x180/0x180 [ 201.106832] ---[ end trace 8e339004db298839 ]--- [ 201.130052] PM: late suspend of devices complete after 23.960 msecs [ 201.130725] ehci-pci 0000:00:1d.0: System wakeup enabled by ACPI [ 201.130885] xhci_hcd 0000:00:14.0: System wakeup enabled by ACPI [ 201.146986] PM: noirq suspend of devices complete after 16.930 msecs [ 201.147591] ACPI: Preparing to enter system sleep state S3 [ 201.147942] PM: Saving platform NVS memory [ 201.147948] Disabling non-boot CPUs ... 
[ 201.147999] intel_pstate CPU 1 exiting [ 201.149324] kvm: disabling virtualization on CPU1 [ 201.149337] smpboot: CPU 1 is now offline [ 201.149640] intel_pstate CPU 2 exiting [ 201.151096] kvm: disabling virtualization on CPU2 [ 201.151108] smpboot: CPU 2 is now offline [ 201.152017] intel_pstate CPU 3 exiting [ 201.153250] kvm: disabling virtualization on CPU3 [ 201.153256] smpboot: CPU 3 is now offline [ 201.156229] ACPI: Low-level resume complete [ 201.156307] PM: Restoring platform NVS memory [ 201.160033] CPU0 microcode updated early to revision 0x1c, date = 2014-07-03 [ 201.160190] Enabling non-boot CPUs ... [ 201.160241] x86: Booting SMP configuration: [ 201.160243] smpboot: Booting Node 0 Processor 1 APIC 0x1 [ 201.172665] kvm: enabling virtualization on CPU1 [ 201.174982] CPU1 is up [ 201.175013] smpboot: Booting Node 0 Processor 2 APIC 0x2 [ 201.187569] CPU2 microcode updated early to revision 0x1c, date = 2014-07-03 [ 201.188796] kvm: enabling virtualization on CPU2 [ 201.191130] CPU2 is up [ 201.191158] smpboot: Booting Node 0 Processor 3 APIC 0x3 [ 201.203297] kvm: enabling virtualization on CPU3 [ 201.205679] CPU3 is up [ 201.210414] ACPI: Waking up from system sleep state S3 [ 201.224617] ehci-pci 0000:00:1d.0: System wakeup disabled by ACPI [ 201.332523] xhci_hcd 0000:00:14.0: System wakeup disabled by ACPI [ 201.332634] PM: noirq resume of devices complete after 121.623 msecs [ 201.372718] PM: early resume of devices complete after 40.058 msecs [ 201.372892] rtlwifi: rtlwifi: wireless switch is on [ 201.373270] sd 0:0:0:0: [sda] Starting disk [ 201.373271] sd 1:0:0:0: [sdb] Starting disk [ 201.445954] rtc_cmos 00:01: System wakeup disabled by ACPI [ 201.692510] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300) [ 201.694719] ata2.00: configured for UDMA/133 [ 201.694724] ahci 0000:00:1f.2: port does not support device sleep [ 201.836724] usb 2-4: reset high-speed USB device number 2 using xhci_hcd [ 201.890158] psmouse serio1: synaptics: 
queried max coordinates: x [..5702], y [..4730] [ 201.930768] psmouse serio1: synaptics: queried min coordinates: x [1242..], y [1124..] [ 202.076784] usb 2-5: reset full-speed USB device number 3 using xhci_hcd [ 202.205100] usb 2-5: ep 0x2 - rounding interval to 64 microframes, ep desc says 80 microframes [ 202.316799] usb 2-7: reset full-speed USB device number 5 using xhci_hcd [ 202.444945] usb 2-7: No LPM exit latency info found, disabling LPM. [ 202.556817] usb 2-8: reset full-speed USB device number 6 using xhci_hcd [ 202.908691] usb 2-6: reset high-speed USB device number 4 using xhci_hcd [ 203.932602] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320) [ 204.044890] ata1.00: configured for UDMA/133 [ 206.228698] PM: resume of devices complete after 4855.892 msecs [ 206.380738] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment [ 206.383152] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment [ 206.385775] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment [ 206.388066] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment [ 206.390415] acpi device:30: Cannot transition to power state D3cold for parent in (unknown) [ 206.393078] pci_bus 0000:01: Allocating resources [ 206.393098] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment [ 206.395470] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment [ 206.397927] i915 0000:00:02.0: BAR 6: [??? 0x00000000 flags 0x2] has bogus alignment [ 206.518516] Restarting kernel threads ... done. [ 206.518812] PM: Basic memory bitmaps freed [ 206.518816] Restarting tasks ... done.
There is one more occasional Kernel Oops (below), which I believe again has to do with Intel.
[ 8770.745396] ------------[ cut here ]------------
[ 8770.745441] WARNING: CPU: 0 PID: 7206 at drivers/gpu/drm/i915/intel_display.c:9756 intel_check_page_flip+0xd2/0xe0 [i915]()
[ 8770.745444] Kicking stuck page flip: queued at 466186, now 466191
[ 8770.745445] Modules linked in: cpuid rfcomm ctr ccm bnep pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) bridge stp llc xt_conntrack iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_CHECKSUM xt_tcpudp iptable_mangle ip_tables x_tables nls_utf8 nls_cp437 vfat fat rtsx_usb_ms memstick snd_hda_codec_hdmi joydev mousedev hid_sensor_rotation hid_sensor_incl_3d hid_sensor_als hid_sensor_accel_3d hid_sensor_magn_3d hid_sensor_gyro_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf industrialio hid_sensor_iio_common iTCO_wdt iTCO_vendor_support hid_multitouch x86_pkg_temp_thermal intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm btusb hid_sensor_hub bluetooth uvcvideo videobuf2_vmalloc videobuf2_memops
[ 8770.745484] videobuf2_core v4l2_common videodev media pcspkr evdev mac_hid arc4 psmouse serio_raw efivars i2c_i801 rtl8723be btcoexist rtl8723_common rtl_pci rtlwifi mac80211 snd_soc_rt5640 cfg80211 snd_soc_rl6231 snd_hda_codec_realtek i915 snd_soc_core snd_hda_codec_generic ideapad_laptop ac snd_compress dw_dmac sparse_keymap drm_kms_helper rfkill battery dw_dmac_core snd_hda_intel snd_pcm_dmaengine snd_soc_sst_acpi snd_hda_controller video 8250_dw regmap_i2c snd_hda_codec drm snd_hwdep snd_pcm spi_pxa2xx_platform i2c_designware_platform soc_button_array snd_timer i2c_designware_core snd i2c_algo_bit soundcore shpchp lpc_ich button processor fuse ipv6 autofs4 ext4 crc16 jbd2 mbcache btrfs xor raid6_pq algif_skcipher af_alg dm_crypt dm_mod sg usbhid sd_mod rtsx_usb_sdmmc rtsx_usb crct10dif_pclmul
[ 8770.745536] crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd ahci libahci libata xhci_pci ehci_pci xhci_hcd ehci_hcd scsi_mod usbcore usb_common thermal fan thermal_sys hwmon i2c_hid hid i2c_core sdhci_acpi sdhci mmc_core gpio_lynxpoint
[ 8770.745561] CPU: 0 PID: 7206 Comm: icedove Tainted: G W O 4.0.4+ #14
[ 8770.745563] Hardware name: LENOVO 20344/INVALID, BIOS 96CN29WW(V1.15) 10/16/2014
[ 8770.745565] 0000000000000000 ffffffffa0706f68 ffffffff81522198 ffff88025f203dc8
[ 8770.745569] ffffffff8106c5b1 ffff880250f83800 ffff880254dcc000 0000000000000000
[ 8770.745572] 0000000000000000 0000000000000000 ffffffff8106c62a ffffffffa0709d50
[ 8770.745575] Call Trace:
[ 8770.745577] <IRQ> [<ffffffff81522198>] ? dump_stack+0x40/0x50
[ 8770.745592] [<ffffffff8106c5b1>] ? warn_slowpath_common+0x81/0xb0
[ 8770.745595] [<ffffffff8106c62a>] ? warn_slowpath_fmt+0x4a/0x50
[ 8770.745616] [<ffffffffa06a0bb3>] ? __intel_pageflip_stall_check+0x113/0x120 [i915]
[ 8770.745634] [<ffffffffa06af042>] ? intel_check_page_flip+0xd2/0xe0 [i915]
[ 8770.745652] [<ffffffffa067cde1>] ? ironlake_irq_handler+0x2e1/0x1010 [i915]
[ 8770.745657] [<ffffffff81092d1a>] ? check_preempt_curr+0x5a/0xa0
[ 8770.745663] [<ffffffff812d66c2>] ? timerqueue_del+0x22/0x70
[ 8770.745668] [<ffffffff810bb7d5>] ? handle_irq_event_percpu+0x75/0x190
[ 8770.745672] [<ffffffff8101b945>] ? read_tsc+0x5/0x10
[ 8770.745676] [<ffffffff810bb928>] ? handle_irq_event+0x38/0x50
[ 8770.745680] [<ffffffff810be841>] ? handle_edge_irq+0x71/0x120
[ 8770.745685] [<ffffffff810153bd>] ? handle_irq+0x1d/0x30
[ 8770.745689] [<ffffffff8152a866>] ? do_IRQ+0x46/0xe0
[ 8770.745694] [<ffffffff8152866d>] ? common_interrupt+0x6d/0x6d
[ 8770.745695] <EOI> [<ffffffff8152794d>] ? system_call_fastpath+0x16/0x1b
[ 8770.745701] ---[ end trace 8e339004db29883a ]---
Network
In my case, the laptop came with the Realtek Wireless device (details above in lspci output). Note: The machine has no wired interface.
While the Intel Wifi devices shipped with this laptop have their own share of problems, this device (rtl8723be) works out of the box, but only for a while. There is no clear pattern to what triggers the bug, but once it is triggered, the network just freezes. Nothing is logged.
If your Yoga 2 13 came with the RTL chip, the following workaround may help avoid the network issues.
rrs@learner:/media/SSHD/tmp$ cat /etc/modprobe.d/rtl8723be.conf
options rtl8723be fwlps=0
MCE
On almost every boot, the kernel eventually reports MCE errors. This is not something I understand well, but so far it hasn't caused any visible issues. And from what I have googled so far, nobody seems to have fixed it anywhere.
So, with fingers crossed, let's just hope this never translates into a real problem.
What the kernel reports about the CPU's capabilities:
[ 0.041496] mce: CPU supports 7 MCE banks
[ 299.540930] mce: [Hardware Error]: Machine check events logged
The MCE logs extracted from the buffer.
mcelog: failed to prefill DIMM database from DMI data
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 5
MISC 38a0000086 ADDR fef81880
TIME 1432455005 Sun May 24 13:40:05 2015
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee0000000040110a MCGSTATUS 0
MCGCAP c07 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 69
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 6
MISC 78a0000086 ADDR fef81780
TIME 1432455005 Sun May 24 13:40:05 2015
MCG status:
MCi status:
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ae0000000040110a MCGSTATUS 0
MCGCAP c07 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 69
Hardware event. This is not a software error.
MCE 2
CPU 0 BANK 5
MISC 38a0000086 ADDR fef81880
TIME 1432455114 Sun May 24 13:41:54 2015
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee0000000040110a MCGSTATUS 0
MCGCAP c07 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 69
Hardware event. This is not a software error.
MCE 3
CPU 0 BANK 6
MISC 78a0000086 ADDR fef81780
TIME 1432455114 Sun May 24 13:41:54 2015
MCG status:
MCi status:
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ae0000000040110a MCGSTATUS 0
MCGCAP c07 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 69
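For readers wondering how mcelog arrives at those descriptions from the raw STATUS value, here is a minimal sketch in Python. It assumes the IA32_MCi_STATUS bit layout and the "cache hierarchy" compound error code format (000F 0001 RRRR TTLL) as documented in the Intel SDM; the function names are made up for illustration and this is not how mcelog itself is implemented.

```python
# Sketch only: decodes the architectural bits of an IA32_MCi_STATUS value.
# Bit positions are taken from the Intel SDM; names are illustrative.
STATUS_FLAGS = [
    (63, "Valid"),
    (62, "Error overflow"),
    (61, "Uncorrected error"),
    (60, "Error enabled"),
    (59, "MCi_MISC register valid"),
    (58, "MCi_ADDR register valid"),
    (57, "Processor context corrupt"),
]

def decode_status(status):
    """Split a 64-bit STATUS value into its flags and error codes."""
    flags = [name for bit, name in STATUS_FLAGS if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_code": status & 0xFFFF,                    # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }

def decode_cache_error(mca_code):
    """Decode a compound cache hierarchy code: 000F 0001 RRRR TTLL."""
    if (mca_code & 0xEF00) != 0x0100:
        return None  # not a cache hierarchy error
    levels = ["Level-0", "Level-1", "Level-2", "Generic"]
    types = ["Instruction", "Data", "Generic", "Reserved"]
    return {
        "corrected_filtering": bool(mca_code & 0x1000),  # the F bit
        "request": (mca_code >> 4) & 0xF,                # RRRR, 0 = generic error
        "type": types[(mca_code >> 2) & 0x3],            # TT
        "level": levels[mca_code & 0x3],                 # LL
    }

if __name__ == "__main__":
    decoded = decode_status(0xEE0000000040110A)
    print(decoded["flags"])
    print(decode_cache_error(decoded["mca_code"]))
```

Run against the two STATUS values in the log, the only difference is the "Error overflow" flag, which matches what mcelog printed for banks 5 and 6.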
apt-get dist-upgrade, synced my git trees, etc. Everything was fine.
The next day, in the airport between two flights, the laptop didn't
boot - it didn't see the SSD at all. I tried rebooting a gazillion
times, nothing. I was quite upset - at the hardware, and at myself for
not paying attention to the unreliability signs before.
Once I arrived at the destination, I opened the laptop, tried
re-seating the SSD, nothing. I bought a SATA-to-USB bridge, and
surprise! Boots from the first, no issues. Diagnosis A: Laptop SATA
connector has issues.
I worked with this SATA-to-USB bridge for a couple of days, but it was
quite slow (~20MB/s), so I bought a SATA-to-USB3 cradle, which should
be much faster. But the SSD was not visible in this cradle. Not only
that, but it was causing the laptop to hang in the POST screen -
reliably. Turn the cradle off, the laptop passes POST; turn it on, the
laptop takes 2 minutes to pass the POST. OK, the cradle is
broken. Connect the SSD back to the USB2 thing: not booting. For about
five minutes, it was like "dead". After that, it booted and behaved
normally. I didn't know what to think, so I just put it aside. I worked
for the rest of the week on the USB2 bridge, with no issues (once
the SSD dropped off, but I think that was just USB being USB). So at
the end of the week, diagnosis (A) was still the main contender.
On the flight back home, I worked from the plane for a good number of
hours, again no issue. Laptop/SSD were fully powered off before the
flight, powered with no issues, worked fine. At the end of the flight
I completely shut down my laptop. Diagnosis A still on top.
Real trouble now
After getting home and sleeping a bit, I wanted to power up the laptop
just to transfer the code I wrote on the plane. But it didn't power
up. No problem, I said, now I actually have access to running Linux
machines and I can check what's happening. And to my surprise, the SSD
was behaving erratically:
kernel: scsi 21:0:0:0: Direct-Access USB TO I DE/SATA Device 0008 PQ: 0 ANSI: 0
kernel: scsi 22:0:0:0: Direct-Access Samsunp w 0 EVO 500G 0008 PQ: 0 ANSI: 0
, but with lots of ATA errors
cfdisk /dev/sdb could take >5 seconds before showing the screen
20:22:57 kernel: ata2.00: ATA-9: Samsung SSD 840 EVO 500GB, EXT0BB6Q, max UDMA/133
20:22:57 kernel: ata2.00: 976773168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
20:22:57 kernel: ata2.00: configured for UDMA/133
20:22:57 kernel: scsi 3:0:0:0: Direct-Access ATA Samsung SSD 840 EXT0 PQ: 0 ANSI: 5
20:22:57 kernel: sd 3:0:0:0: [sdc] 488397168 512-byte logical blocks: (250 GB/232 GiB)
20:22:57 kernel: sd 3:0:0:0: [sdc] Write Protect is off
20:22:57 kernel: sd 3:0:0:0: [sdc] Mode Sense: 00 3a 00 00
20:22:57 kernel: sd 3:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
20:22:57 kernel: ata2.00: exception Emask 0x0 SAct 0x2 SErr 0x400001 action 0x6 frozen
20:22:57 kernel: ata2: SError: RecovData Handshk
20:22:57 kernel: ata2.00: failed command: READ FPDMA QUEUED
20:22:57 kernel: ata2.00: cmd 60/08:08:68:01:00/00:00:00:00:00/40 tag 1 ncq 4096 in
20:22:57 kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
20:22:57 kernel: ata2.00: status: DRDY
20:22:57 kernel: ata2: hard resetting link
20:22:57 kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
20:22:57 kernel: ata2.00: failed to get NCQ Send/Recv Log Emask 0x1
20:22:57 kernel: ata2.00: failed to get NCQ Send/Recv Log Emask 0x1
20:22:57 kernel: ata2.00: configured for UDMA/133
20:22:57 kernel: ata2.00: device reported invalid CHS sector 0
20:22:57 kernel: ata2: EH complete
20:22:57 kernel: ata2.00: exception Emask 0x0 SAct 0x400000 SErr 0x400001 action 0x6
20:22:57 kernel: ata2.00: irq_stat 0x44000008
20:22:57 kernel: ata2: SError: RecovData Handshk
20:22:57 kernel: ata2.00: failed command: READ FPDMA QUEUED
20:22:57 kernel: ata2.00: cmd 60/08:b0:08:03:00/00:00:00:00:00/40 tag 22 ncq 4096 in
20:22:57 kernel: res 41/84:00:08:03:00/00:00:00:00:00/00 Emask 0x410 (ATA bus error) <F>
20:22:57 kernel: ata2.00: status: DRDY ERR
20:22:57 kernel: ata2.00: error: ICRC ABRT
20:22:57 kernel: ata2: hard resetting link
20:22:57 kernel: ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
20:22:57 kernel: ata2.00: failed to get NCQ Send/Recv Log Emask 0x1
20:22:57 kernel: ata2.00: failed to get NCQ Send/Recv Log Emask 0x1
20:22:57 kernel: ata2.00: configured for UDMA/133
20:22:57 kernel: ata2: EH complete
Note that all of the messages were in short order (during boot, they
show the same timestamp but I don't think it was actually the same
second).
I tried connecting the SSD over USB2 (partially working), over USB3
(not working!), and directly over SATA (initially not working). I
connected another SSD I had around (same model, just smaller capacity)
over all three, it worked (so the USB3 bridge was working, at least).
Diagnosis (A) was out the door now, and the situation was very clear:
SSD dying/dead/almost gone. So I connected both the broken SSD and the
empty SSD to my workstation over SATA, and started copying data (once I
managed to boot with the old one being visible and working).
During the data copy, I saw that the "broken" SSD did indeed behave
erratically: data was coming off it at either ~40MB/s, ~70MB/s, or
~160MB/s. No other speed, at least not for long, just cycling
between these three; and this is very slow for this SSD
model. And then I remembered that there is, for this model (Samsung
840 Evo), an advisory/firmware fix for old data getting harder to access
(slower and slower), due to how TLC cell levels are read/etc. I don't
know exactly what "old" means, but since the partition table was
written only once, it should be the oldest thing written, which would
make it the most susceptible to the slowdown, and could explain the
cfdisk slowness. So after the data copy, I tested this:
Failed to upgrade firmware (or some message like that)
dd if=/dev/zero of=/dev/sdX is too common, let's try
a builtin (ATA) erase! After fighting with hdparm and the fact that
my BIOS does indeed "security freeze" the drives (so you can't
change the security settings or erase the drives), and finding an
article on the net that gives a few workarounds, I managed to "unfreeze"
it by live unplugging not only the SATA cable, but also the power
cable, and plugging them back in. Time to erase.
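Whether a drive is still "security frozen" can be read from the Security section of `hdparm -I /dev/sdX`. The following is a rough Python sketch that parses that section from captured text. It assumes the usual layout, where each state is printed on its own indented line and prefixed with "not" when it does not apply; `security_state` is a hypothetical helper, and the sample is illustrative text, not output from the drive in this post.

```python
import re

def security_state(hdparm_identify_output):
    """Return {'enabled': bool, 'locked': bool, 'frozen': bool} for the
    states found in the Security section of `hdparm -I` output (assumed layout)."""
    state = {}
    in_security = False
    for line in hdparm_identify_output.splitlines():
        if line.startswith("Security:"):
            in_security = True
            continue
        if not in_security:
            continue
        if line and not line[0].isspace():
            break  # the next unindented line starts a new section
        m = re.match(r"\s+(not\s+)?(enabled|locked|frozen)\s*$", line)
        if m:
            # A leading "not" means the state does not apply.
            state[m.group(2)] = m.group(1) is None
    return state

# Illustrative sample only (assumed format).
SAMPLE = (
    "Security: \n"
    "\tMaster password revision code = 65534\n"
    "\t\tsupported\n"
    "\tnot\tenabled\n"
    "\tnot\tlocked\n"
    "\t\tfrozen\n"
    "\tnot\texpired: security count\n"
    "\t\tsupported: enhanced erase\n"
    "Checksum: correct\n"
)

if __name__ == "__main__":
    print(security_state(SAMPLE))
```

A drive is only ready for the erase once the "frozen" entry comes back false, which is what the cable unplugging trick above is meant to achieve.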
Side-note here:
$ smartctl -a /dev/sda
[...]
Error 1341 occurred at disk power-on lifetime: 17614 hours (733 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 41 02 1f c0 9c 40 Error: UNC at LBA = 0x009cc01f = 10272799
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 f8 08 20 c0 9c 40 00 41d+01:51:50.974 READ FPDMA QUEUED
60 08 00 18 c0 9c 40 00 41d+01:51:50.972 READ FPDMA QUEUED
ef 10 02 00 00 00 a0 00 41d+01:51:50.972 SET FEATURES [Reserved for Serial ATA]
ec 00 00 00 00 00 a0 00 41d+01:51:50.971 IDENTIFY DEVICE
ef 03 45 00 00 00 a0 00 41d+01:51:50.971 SET FEATURES [Set transfer mode]
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 20511 156170102
[...]
The status of the degraded RAID array looks like this:
$ cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb7[1]
      409845696 blocks [2/1] [_U]
md0 : active raid1 sda6[0] sdb6[1]
      291776 blocks [2/2] [UU]
The [_U] means that one of the two disks has failed; it should normally be [UU]. There are actually two RAID-1s: a small md0 (sda6 + sdb6) for /boot and the main md1 (sda7 + sdb7), which holds the OS and my data. Apparently (at first at least), only sda7 was faulty and got kicked out of the array:
$ dmesg | grep kick
md: kicking non-fresh sda7 from array!
Anyway, I ordered a replacement disk, removed the dead disk (I checked the serial number and brand beforehand, so I wouldn't accidentally remove the wrong one), inserted the new disk and rebooted. Note: For this to work, you have to have (previously) installed the bootloader (usually GRUB) onto both disks, otherwise you won't be able to boot from either of them (which you'll want to do if one of them dies, of course). In my case, sda was now dead, so I put sdb in its place (physically, by using the other SATA connector/port) and the new replacement disk would become the new sdb. After the reboot, the new disk needs to be partitioned like the other RAID disk. This can be done easily by copying the partition layout of the "good" disk (now sda after the reboot) onto the empty disk (sdb):
$ sfdisk -d /dev/sda | sfdisk /dev/sdb
Specifically, the RAID disks/partitions need to have the type/ID "fd" ("Linux raid autodetect"); check that this is the case. Then you can add the new disk to the RAIDs:
$ mdadm /dev/md0 --add /dev/sdb6
$ mdadm /dev/md1 --add /dev/sdb7
After a few hours the RAID will be re-synced properly and all is good again. You can check the progress via:
$ watch -n 1 cat /proc/mdstat
You should probably not reboot during the resync (though I'm not 100% sure if that would be an issue in practice; please leave a comment if you know). Also, don't forget to install GRUB on the new disk so you can still boot when the next disk dies:
$ grub-mkdevicemap
$ grub-install /dev/sdb
And it might be a good idea to use S.M.A.R.T. to check the new disk, just in case. I did a quick run for the new disk via:
$ smartctl -t short /dev/sdb
# Wait a few minutes after this.
$ smartctl -a /dev/sdb
[...]
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 22 -
[...]
Looks good. So far.
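The [_U] versus [UU] check on /proc/mdstat described earlier is easy to automate, for example from a cron job. Here is a small Python sketch, not part of the original procedure; `degraded_arrays` is a made-up name and the parsing only covers the simple status lines shown in this post.

```python
import re

def degraded_arrays(mdstat_text):
    """Map array name -> status flags (e.g. '_U') for every degraded md array."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"(md\d+)\s*:", line)
        if header:
            current = header.group(1)
        # Status looks like "[2/1] [_U]": total devices, active devices, flags.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            total, active = int(status.group(1)), int(status.group(2))
            flags = status.group(3)
            if active < total or "_" in flags:
                result[current] = flags
            current = None
    return result

# The mdstat contents quoted in this post.
SAMPLE = """Personalities : [raid1]
md1 : active raid1 sdb7[1]
      409845696 blocks [2/1] [_U]
md0 : active raid1 sda6[0] sdb6[1]
      291776 blocks [2/2] [UU]
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a live system you would pass in the contents of /proc/mdstat; an empty result means every array has all its members.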
Jan 24 18:16:08 foo kernel: [ 1965.343980] ata5.00: exception Emask 0x50 SAct 0x39 SErr 0x800 action 0x6 frozen
Jan 24 18:16:08 foo kernel: [ 1965.343991] ata5.00: irq_stat 0x08000000, interface fatal error
Jan 24 18:16:08 foo kernel: [ 1965.344001] ata5: SError: HostInt
Jan 24 18:16:08 foo kernel: [ 1965.344036] ata5.00: failed command: READ FPDMA QUEUED
Jan 24 18:16:08 foo kernel: [ 1965.344055] ata5.00: cmd 60/08:00:18:31:44/00:00:1a:00:00/40 tag 0 ncq 4096 in
Jan 24 18:16:08 foo kernel: [ 1965.344059] res 40/00:2c:e6:c2:fd/00:00:26:00:00/40 Emask 0x50 (ATA bus error)
Jan 24 18:16:08 foo kernel: [ 1965.344071] ata5.00: status: DRDY
Jan 24 18:16:08 foo kernel: [ 1965.344081] ata5.00: failed command: WRITE FPDMA QUEUED
It is a good time to think about changing your hard drive, but it may already be too late. Smartmontools (aka smartd) is a dedicated tool for monitoring hard drives, and it does a good job. I think it is not installed by default in Debian, but it should be. It scans your hard drives for SMART capabilities and monitors the health of each drive using the drive's internal tools. In the case of bad blocks, you will start to see entries like this:
Feb 17 14:05:17 bar smartd[1268]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Feb 17 14:35:17 bar smartd[1268]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors
Obviously, when you see this log, it is also a good time to change your hard drive. In the case of my wife's HD, I only got data from logcheck. That means the error is not that important (a transient failure: something is wrong, but the HD can cope with it). But I still decided to get a new one for my wife. Whenever I receive a new drive, the first thing I do is check it for errors. You can do that using the program badblocks in write mode. It takes ages to test (count up to 1 day for 1TB on USB), but at the end you know that you have a good candidate, one that is worth installing your data on. You just have to follow this procedure
The following warning/error was logged by the smartd daemon:
Device: /dev/sda [SAT], 15 Currently unreadable (pending) sectors
syslog entries:
Mar 2 20:52:57 foo kernel: [ 8317.419715] ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Mar 2 20:52:57 foo kernel: [ 8317.419724] ata4.00: irq_stat 0x40000008
Mar 2 20:52:57 foo kernel: [ 8317.419732] ata4.00: failed command: READ FPDMA QUEUED
Mar 2 20:52:57 foo kernel: [ 8317.419747] ata4.00: cmd 60/80:00:00:af:08/00:00:28:00:00/40 tag 0 ncq 65536 in
Mar 2 20:52:57 foo kernel: [ 8317.419750] res 41/40:00:65:af:08/00:00:28:00:00/40 Emask 0x409 (media error) <F>
Mar 2 20:52:57 foo kernel: [ 8317.419758] ata4.00: status: DRDY ERR
Mar 2 20:52:57 foo kernel: [ 8317.419764] ata4.00: error: UNC
Mar 2 20:52:57 foo kernel: [ 8317.423959] ata4.00: configured for UDMA/133
I am not blaming any particular brand (like Western Digital); every computer part I have ever bought has had to go through the same procedure, and it is a known fact that computer parts have a non-zero chance of being DOA (dead on arrival) or of dying after a few weeks. But as a consumer you should be aware of that and take action to avoid spending 10h configuring your computer only to see it fail after a week... The time spent testing is a win in the long term.
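If you would rather not depend on logcheck mails alone, the smartd messages quoted above are regular enough to pick out of a syslog file with a few lines of Python. This is a rough sketch under the assumption that the message text always has the "Device: ... N Currently unreadable (pending) sectors" shape seen here; `pending_sectors` is a hypothetical helper name.

```python
import re

# Matches the smartd message format quoted in this post (assumed stable).
PENDING = re.compile(
    r"Device: (\S+).*?(\d+) Currently unreadable \(pending\) sectors"
)

def pending_sectors(log_lines):
    """Return {device: latest pending-sector count} from syslog lines."""
    latest = {}
    for line in log_lines:
        m = PENDING.search(line)
        if m:
            device = m.group(1).rstrip(",")
            latest[device] = int(m.group(2))  # later lines overwrite earlier ones
    return latest

# The log lines quoted in this post.
SAMPLE = [
    "Feb 17 14:05:17 bar smartd[1268]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors",
    "Feb 17 14:35:17 bar smartd[1268]: Device: /dev/sda [SAT], 1 Currently unreadable (pending) sectors",
    "Device: /dev/sda [SAT], 15 Currently unreadable (pending) sectors",
    "Mar 2 20:52:57 foo kernel: [ 8317.419764] ata4.00: error: UNC",
]

if __name__ == "__main__":
    print(pending_sectors(SAMPLE))
```

A count that keeps growing between runs is a much stronger hint than a single pending sector that the drive should be replaced.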
-c, -l selftest, etc.)
smartctl -a reads a while, and then:
Error SMART Error Self-Test Log Read failed: Input/output error
Smartctl: SMART Self Test Log Read Failed
real 0m39.029s
The timeout above also has generated lots of errors in the drive's
error log. I don't know how to read these properly, but in any case
they don't seem too scary:
Error 144 occurred at disk power-on lifetime: 13552 hours (564 days + 16 hours)
When the command that caused the error occurred, the device was doing SMART Offline or Self-test.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
10 51 00 80 ae 39 40
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 80 70 80 ae 39 1c 08 41d+08:07:45.052 WRITE FPDMA QUEUED
b0 d0 01 00 4f c2 00 08 41d+08:07:45.038 SMART READ DATA
ec 00 01 00 00 00 00 08 41d+08:07:44.958 IDENTIFY DEVICE
2f 00 01 10 00 00 00 08 41d+08:07:44.957 READ LOG EXT
61 80 70 80 ae 39 1c 08 41d+08:07:37.960 WRITE FPDMA QUEUED
For some of the errors, all preceding commands are WRITE FPDMA QUEUED,
but all are during a "SMART Offline or Self-test" phase.
When a self-test is not being done, reading all the SMART data
(smartctl -a) is very very quick, taking half a second.
The only thing I can think of is that the drive's own area for storing
SMART data is unhealthy, and reading it takes time, and a concurrent
SMART test and I/O load make it hard for the drive to do so. But
again, I can't trigger any real I/O error, neither at the beginning of
the drive nor at the end, so
This also happens when the drive is connected to a plain SATA port,
skipping the RAID controller, so it's not just the controller playing
games with me.
I'm really confused now. Given my previous experience, this drive will
die, should already have died, and yet, no I/O errors, just some
timeouts. Do I just need to wait a couple more weeks?
/home/joey/src/ikiwiki/doc/ikiwiki/
/home/joey/src/pdmenu/src/
/home/joey/src/joeywiki/joey/
/lib/modules/2.6.26-2-686/kernel/lib/
/proc/1/task/1/
Then I realized there was a name already: Fractal.
While directory structures are supposedly hierarchical,
different levels in the hierarchy are self-similar, and so
sometimes the same name is duplicated, with a subtly, or
vastly different meaning, at different levels in the same
directory path.
I had wondered if we were doing something wrong, that this happens so
often. It is occasionally confusing. But aside from using a deep,
nominally hierarchical directory structure, and displaying a slice
through it (as if talking about electron orbitals in a rock orbiting sol),
I don't think we're doing anything wrong.
Here is a oneliner that will find all such directories inside
a specified start directory. You will probably find some amusing ones
you've never noticed.
perl -MFile::Find -le 'find(sub { if (-d $_) { my %bits; foreach $bit (split "/", $File::Find::name) { if (++$bits{lc($bit)} > 1) { print $File::Find::name; return } } } }, shift || ".")'
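For anyone who finds the one-liner hard to read, here is roughly the same idea spelled out in Python. It is a sketch, not a drop-in replacement: `fractal_dirs` is a made-up name, and unlike the Perl version it only compares path components below the start directory, so the start path itself never counts toward a duplicate.

```python
import os

def fractal_dirs(start="."):
    """Yield directories under `start` whose path repeats a component name
    (compared case-insensitively), e.g. src/pdmenu/src."""
    for dirpath, _dirnames, _filenames in os.walk(start):
        relative = os.path.relpath(dirpath, start)
        parts = [p.lower() for p in relative.split(os.sep) if p]
        if len(parts) != len(set(parts)):
            yield dirpath

if __name__ == "__main__":
    import sys
    for path in fractal_dirs(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

As with the one-liner, every directory below a "fractal" one is reported too, since its path still contains the repeated name.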
Apparently that also means I’ve clocked up ten and a half years as a Debian user; I think my previous two years of Linux (mid-95 to mid-97) were split between Slackware and Red Hat, though I couldn’t say for sure at this point. There have already been a few other grand ten-year reviews, such as Joey Hess’s twenty-part serial, or LWN’s week-by-week review, or ONLamp’s interview with Bruce Perens, Eric Raymond and Michael Tiemann on ten years of “open source”. I don’t think I’m going to try matching that sort of depth though, so here are some of my highlights (after the break).
From: Anthony Towns <aj@humbug.org.au>
Subject: Wannabe maintainer.
Date: Sun, 8 Feb 1998 18:35:28 +1000 (EST)
To: new-maintainer@debian.org

Hello world,

I'd like to become a debian maintainer.

I'd like an account on master, and for it to be subscribed to the debian-private list.

My preferred login on master would have been aj, but as that's taken ajt or atowns would be great.

I've run a debian system at home for half a year, and a system at work for about two months. I've run Linux for two and a half years at home, two years at work. I've been active in my local linux users' group for just over a year. I've written a few programs, and am part way through packaging the distributed.net personal proxy for Debian (pending approval for non-free distribution from distributed.net).

I've read the Debian Social Contract.

My PGP public key is attached, and also available as <http://azure.humbug.org.au/~aj/aj_key.asc>.

If there's anything more you need to know, please email me.

Thanks in advance.

Cheers,
aj

--
Anthony Towns <aj@humbug.org.au> <http://azure.humbug.org.au/~aj/>
I don't speak for anyone save myself. PGP encrypted mail preferred.

On Netscape GPLing their browser: ``How can you trust a browser that ANYONE can hack? For the secure choice, choose Microsoft.''
-- <oryx@pobox.com> in a comment on slashdot.org
From: Anthony Towns Date: Sat, Nov 21, 1998 There are a few bugs accumulating against the netbase package which you're maintaining. I was wondering if you'd mind if I made an NMU to fix some of them for the upcoming slink release?
From: Peter Tobias Date: Sat, Nov 21, 1998 No, please go ahead ... I'm quite busy right now and I would really appreciate any help. Please let me know if you need additional information about the package.
From: Anthony Towns Date: Sun, Dec 6, 1998 There's an NMU sitting in Incoming now. It fixes a few bugs, viz: [...]
From: Peter Tobias Date: Fri, Dec 25, 1998 due to my current job I don't have much time to work on my debian packages. In order to have more time for my other debian packages I would like to give away the netbase package. Are you interested in maintaining this package?
From: Anthony Towns Date: Fri, Dec 25, 1998 Ummm. Sure. I guess. (or, iow, *Eeeeeeeeeeeeek*!!!)
Next month I sent Darren Benham a first version of bugreport.cgi, and at some point around then must’ve sent off a pkreport.cgi too; by the month after (October) I’d evidently been added to the debbugs group, because I was merging my archived bugs into the official debbugs directories.
From: Anthony Towns
Subject: BTS and old bugs
Date: Tue, 24 Aug 1999 22:33:16 +1000
To: debian-private@lists.debian.org

ObPrivate: Erm. I'm not sure. It is only even vaguely relevant to developers.

Since bugs #9705 and #36727 don't seem like being fixed any time soon and Darren hasn't managed to convert the BTS to using debbugs.deb yet, I've made a little script to stop us from continuing to lose bug reports, and am running it in my crontab on master.

~ajt/debian-bugs/archive/ contains hardlinked copies of the bugs in ~iwj/debian-bugs/spool/db (except split into sub-directories). When the bugs get expired from the BTS, the hardlink in ~ajt remains, so the file doesn't get lost forever.

In the week or so I've been running it, some 500 odd bugs expired [0].
import System.Environment
import Data.List
import Data.Char
import qualified Data.Map as Map

-- Split the input into lowercase ASCII alphanumeric words.
custwords = filter (/= "") . lines . map (conv . toLower)
    where iswordchar x = isAlphaNum x && isAscii x
          conv x = if iswordchar x then x else '\n'

-- Count how often each word occurs, forcing each count as it is updated.
wordfreq inp = Map.toList $ foldl' updmap (Map.empty::Map.Map String Int) inp
    where updmap nm word = case Map.lookup word nm of
                             Nothing -> Map.insert word 1 nm
                             Just x  -> (Map.insert word $! x + 1) nm

-- Sort by descending count, then alphabetically by word.
freqsort (w1, c1) (w2, c2) = if c1 == c2
                                then compare w1 w2
                                else compare c2 c1

showit (word, count) = show count ++ " " ++ word

-- Print the N most frequent words on stdin, where N is the first argument.
main = do args <- getArgs
          interact $ unlines . map showit . take (read . head $ args) .
                     sortBy freqsort . wordfreq . custwords
Next.